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Abstract Coronary artery disease (CAD) is a complex disease driven by myriad interactions 
of genetics and environmental factors. Traditionally, studies have analyzed only one disease 
factor at a time, providing useful but limited understanding of the underlying etiology. 
Recent advances in cost-effective and high-throughput technologies, such as single 
nucleotide polymorphism (SNP) genotyping, exome/genome sequencing, gene expression 
microarrays and metabolomics assays have enabled the collection of millions of data points 
in many thousands of individuals. In order to make sense of such 'omics' data, effective 
analytical methods are needed. We review and highlight some of the main results in this 
area, focusing on integrative approaches that consider multiple modalities simultaneously. 
Such analyses have the potential to uncover the genetic basis of CAD, produce genomic risk 
scores (GRS) for disease prediction, disentangle the complex interactions underlying 
disease, and predict response to treatment. 
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Introduction 

Coronary artery disease (CAD) is a significant impediment to good health and productivity 
worldwide. Globally, CAD is the leading cause of death with 7 million deaths in 2011 
alone, accounting for 11.2% of all deaths [1]. Unfortunately the impact of CAD is 
increasing, as the disease burden is projected to nearly double from 47 million disability- 
adjusted life years (DALYs) in 1990 to 82 million DALYs in 2020 [2]. However, this impact 
is asymmetrically distributed between developed and developing countries. CAD morbidity 
is projected to more than double in developing countries from 1990 to 2020, but only 
increase by 50% in developed countries [3]. 

Development of CAD is a multi-decade process of atherosclerotic formation and chronic 
inflammation that ultimately leads to angina, myocardial infarction (MI), and death [4]. It 
arises from complex interactions of a multitude of factors - both environmental and genetic. 
Risk factors, such as tobacco use, low physical activity, obesity, hypertension, 
hypercholesterolemia, and diabetes have long been known. The contribution of genetics has 
been suspected given the importance of family history for CAD in an individual's current 
risk. This has been strengthened with data from Swedish twin studies demonstrating a 
substantially larger concordance rate for monozygotic twins compared to dizygotic twins 
and heritability of 0.57 for males and 0.38 for females [5,6]. However, identifying the 
specific genetic changes that modify CAD risk has been and still is a challenge despite 
intense investigation in recent years. 



Uncovering the Genetic Basis of CAD 

The publication of the Human Genome in 2001 offered an unprecedented launching pad for 
the understanding the genetic basis of diseases [7,8]. Within six years, seminal papers were 
published that began to outline the genetic architecture of CAD. These were initially based 
on univariate genome wide association studies (GWAS), where frequencies of individual 
SNPs in those with CAD (cases) were compared to those without CAD (controls). 
Although these GWAS datasets were significantly larger than what researchers were used to 
at that time, the statistical approach of testing individual SNPs for association was rather 
straightforward. One of the first studies considered 1,607 MI cases and 6,728 controls with 
no history of CAD from an Icelandic population [9]. This genome-wide scan found and 
replicated a locus associated with MI whose top SNP yielded an odds ratio of 1.28. 

In the same issue of Science, McPherson et al also found a locus associated with CAD in the 
Ottawa Heart Study that validated in both the Copenhagen City Heart Study and Dallas 
Heart Study [10]. The top SNP in the locus increased the risk of CAD by 15-20% in 
individuals who were heterozygous and 30-40% in those who were homozygous. Analysis 
of CAD cases from the Wellcome Trust Case Control Consortium [11] and German MI 
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Family Study revealed another SNP that was also strongly associated with CAD [12]. Each 
copy of the SNP allele increased CAD risk by 36%. 

Interestingly, these initial SNPs all mapped to chromosome band 9p2 1 , identifying it as the 
first locus to harbour common variants for CAD [13]. This locus was also replicated in other 
populations including Italian, Japanese, Korean, and Indian, suggesting it may be a 
universal risk allele [14-16]. 9p21 does not code for any known proteins but the closest 
genes, within 150kb, are CDKN2A and CDKN2B, which control cell proliferation and 
apoptosis. Moreover, targeted deletion of the non-coding interval of 9p21 in mice increased 
cardiac expression of CDKN2A and CDKN2B as well as vascular cell proliferation [17]. In 
addition, it has also been shown that 9p21 contains an abundance of enhancers and that the 
CAD risk SNPs disrupt STAT1 binding to one such enhancer thereby disrupting the 
interferon- y response [18]. The mediator between 9p21 and downstream genes may be 
ANRIL, a non-coding RNA in the region which can alter downstream gene expression 
[19,20]. 9p21 has also been found to be associated with coronary calcification levels, 
abdominal aortic aneurysms and intracranial aneurysms, suggesting a broader role for this 
variant in vessel function [21-24]. 

CAD has been associated with other loci in subsequent large-scale meta-analyses. A variant 
mapping to 6q25.1 increases CAD risk by 23% per allele [12]. Notably, this SNP is located 
within the MTHFD1L gene, which can influence plasma homocysteine levels and affect risk 
of CAD [11]. New loci have also been identified using haplotype analysis, whereby 
proximal SNPs are grouped together and treated as a single unit, thus enhancing the 
detection power. Re-analysis of the WTCCC dataset identified the 6q26-q27 locus, 
containing four SNPs within the SLC22A3-LPAL2-LPA gene cluster [25]. The LP A gene 
encodes for part of lipoprotein(a) (Lp(a)), whose plasma levels correlate with CAD 
pathogenesis [26,27]. Analysis of the haplotype determined that it accounted for 15% of 
Lp(a) plasma level variability. Soranzo et al also used haplotype analysis to identify 12q24 
as a CAD locus containing two SNPs yielding an odds ratio of 1.144 [28]. Interestingly, this 
locus also demonstrates considerably disease pleiotropy with type 1 diabetes, celiac disease 
and hypertension. 



"Missing" Heritability 

The initial univariate GWAS were able to identify common SNPs associated with CAD 
with low to moderate effect size [29]. However, these SNPs account for less than 10% of 
CAD heritability, and it has been posited that the remaining unexplained heritability may be 
due to rarer variants with larger effects and/ or undiscovered common variants with small 
effects [30-32]. These shortcomings suggested the need for larger data sets to allow for 
increased detection power. The CARDIoGRAM Consortium was formed to overcome 
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these issues by amalgamating 14 GWAS with 22,223 CAD cases and 64,762 controls of 
European ancestry [33]. This meta-analysis identified 13 new loci, with each allele 
increasing CAD risk by between 6% to 17% [34]. The C4D Genetics Consortium performed 
another large-scale analysis, where 8,424 cases of European ancestry and 6,996 cases of 
South Asian ancestry, along with 15,062 controls were analysed [35]. Five new loci were 
identified including one SNP of particular biological interest as the nearest gene, PDGFD 
(117kb downstream), encodes the platelet-derived growth factor D protein. This protein is 
suspected to promote pathogenesis of atherosclerotic plaques and the SNP's risk allele was 
associated with increased transcription of PDGFD in aortic media, aortic adventitia and 
mammary artery. Together, both CARDIoGRAM and C4D recently identified 46 genome- 
wide significant loci, explaining a little over 10% of CAD heritability [36]. While many 
CAD SNPs are still to be discovered, more collaborative GWAS with larger and more 
diverse sample populations are needed to further explain the remaining "missing" 
heritability. 



Constructing Genomic Risk Scores 

One of the potential benefits of genomic association studies is the use of an individual's 
genomic profile to create a score capturing the risk of developing CAD. Currently, the 
Framingham Risk Score (FRS) is a powerful tool that incorporates a patient's age, sex, 
cholesterol, smoking status, blood pressure, and diabetes status to generate a 10-year CAD 
risk prediction [37]. However, it can be the case that a majority of new CAD cases appear in 
individuals who are not in the highest FRS risk group [38]. Given that many of the FRS 
factors are dependent on age and/or have a genetic component, the use of genetic variants 
may allow for earlier and more efficient identification of those at risk for CAD before 
changes in other risk factors like blood pressure or plasma lipid levels occur. Therefore, a 
genomic risk score (GRS), comprised of the increased risk associated with genetic variants, 
can be calculated and used to stratify patients into different risk categories, as is done with 
the FRS; early stratification being crucial for maximum benefit from lifestyle modification 
or therapeutic intervention to stymie pathogenesis. 

As an initial investigation of the usefulness of a GRS with regard to individual risk 
prediction, Ripatti et al built a 13-SNP GRS for CAD and tested its utility in a prospective 
study of 30,725 individuals of European ancestry free of cardiovascular disease [39]. They 
found that patients with a GRS in the top 20% had a 66% increased risk (95% CI: 35-104%, 
P=10" 10 ). In another study, a very similar 13-SNP GRS was tested in 3,014 individuals from 
the Framingham Heart Study with an 11 -year median follow-up [40]. Analysis in this 
population found that each risk allele had a hazard ratio of 1.07 (95% CI: 1.00-1.15, 
P=0.04). While this GRS did not improve prediction of CAD over the FRS, it did associate 
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strongly with high coronary artery calcium levels, a marker for subclinical atherosclerosis, 
suggesting that these SNPs may be predictive of atherosclerotic plaque progression. 

In another prospective study of 4,818 Caucasian males [41], a 15-SNP GRS improved net 
reclassification (6.5%, P=0.044) and discrimination (1.11%, P=0.048) over FRS, indicating 
that there was a marginal improvement of identifying those at risk for CAD [42] . The largest 
benefit of the 15-SNP GRS was realized when analysis was restricted to men between 50 
and 59 years, where a 2.8% (P=0.0038) discrimination improvement over FRS occurred. 
Analysis of a 28-SNP GRS in a Finnish-based prospective cohort of 24,124 individuals also 
showed an improvement over FRS [43]. In this study, FRS provided a discrimination index 
of 0.851, while adding the GRS improved the index to 0.856 (P=0.0002). Moreover, 
addition of the GRS helped reclassify 12% of the individuals as high-risk, potentially 
identifying a group of patients that would benefit from early targeted healthcare. The value 
of a GRS also extends beyond individual risk prediction as it also captures a "best guess" of 
the genetic architecture of disease/trait, thus genetic overlaps between CAD traits can also 
be assessed [44] . 

While it is still early days, these GRS studies point to the potential use of SNPs in 
identifying and classifying individuals as high- and low-risk for CAD. It is also clear that 
GRS's based on lead SNPs from each associated locus use only a small fraction of the 
potential for prediction and more elaborate strategies and statistical methods, such as 
supervised learning, are needed to optimize the genomic predictions [45]. Further analyses 
using large numbers of genetic variants are needed to determine the effectiveness of a GRS 
as a screening tool and indicator for intervention, especially in or before the early stages of 
pathogenesis where FRS variables like age, cholesterol and blood pressure are less predictive 
and lifestyle modifications can have a greater impact. 



Network Approaches to Understand CAD Etiology 

Despite the success of GWAS, these results only provide one piece of the puzzle. It has been 
challenging to interpret the individual SNP associations in terms of the underlying 
biological pathways, and therefore current knowledge of disease etiology is still limited. To 
have a phenotypic effect, true causal genetic variants must exert their influence through 
other mediators, such as changes in the structure of protein encoded by the gene (e.g. non- 
synonymous variants) or changes in regulatory elements affecting the binding affinity of 
transcription factors, thus affecting the gene's RNA levels and presumably the levels of the 
encoded protein. Data about these biological phenomena were not available in many early 
studies, limiting the ability to decipher these complex effects. To address this shortcoming, 
studies now aim to combine various sources of systems level information, such as genomic, 
transcriptomic, and metabolomic data. Since both gene expression and metabolites are 
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known to be influenced by genetic variation, joint investigation of these genetic effects on 
gene expression and metabolite levels and on clinically-relevant phenotypes such as 
atherosclerosis affords insight into how the genetic effects on disease are mediated [46] . 

Beyond the ability to examine multiple sources of biological data at once, another important 
conceptual advance has been the shift towards examining genes as part of networks, such as 
transcriptional regulation networks [47], rather than considering them in isolation. A 
network-based analysis has the power to better model these biomolecular relationships and 
their potential role in disease. The network is the context: in some cases, the individual gene 
associations may be weak but subtle global trends may reveal themselves when examined at 
the level of multiple related genes. 

In a formal mathematical sense, a network consists of nodes (also called vertices) connected 
by edges (see Box 1). Typically, nodes represent some discrete entity, such as a gene, protein 
or metabolite, whereas edges represent relationships between entities. The edges may be 
undirected, representing a symmetric relationship such as "gene A is correlated with gene B" 
or directed, representing an asymmetric relationship such as "gene A up-regulates gene B". 
The edges need not represent purely physical relationships such as interactions; they may 
simply represent statistical effects such as correlation. One of the earliest and simplest 
applications in network analysis has been visualization of complex relationships between 
biological entities such as genes and genetic variants, allowing the researcher to construct a 
mental image of the underlying etiology. However, as the data becomes larger, more 
complex, and multi-faceted so the networks become less interpretable, requiring more 
sophisticated approaches for visualization and analysis in order to extract biological 
meaning from them. 

Researchers have long known that information sources such as transcriptomics can offer 
valuable insights into CAD etiology [48,49], as circulating leukocytes link organ systems in 
mediating the atherosclerotic process [50], however integration with other data sources 
including genetic and metabolite variation has only been possible relatively recently with the 
wide-spread use of high-throughput assays. One of the first examples of network-driven 
models of complex disease is that by Chen et al. [51], who examined genotypes, gene 
expression, and metabolite data from BxH-cross mice. They constructed networks of co- 
expressed genes and identified highly-connected components (modules) of these networks 
that were associated both with genetic variants and with several intermediate phenotypes 
related to obesity, diabetes and atherosclerosis, such as abdominal fat mass, weight, plasma 
insulin, free fatty acids, total plasma cholesterol and aortic lesion size. Using a statistical 
approach based on Mendelian Randomization [52] (see below), some of these sub-networks 
were then postulated as causal mediators between genetic variants and the disease-related 
traits. One of the modules with the strongest associations that was also supported to have a 
causal effect on metabolic traits was the macrophage-enriched metabolic network (MEMN) 



6 



Downloaded from http://biorxiv.org/on September 18, 2014 



consisting of 1,406 genes, out of which 375 genes were estimated to be causal of obesity- 
related traits. From these genes, it was hypothesized that Lpl and Lactb caused obesity and 
that Ppmll was a causal driver of phenotypes related to metabolic syndrome, predictions 
which were subsequently validated in mice [53]. Significantly, the MEMN network together 
with its link to obesity were also replicated in human adipose tissue [54]. Beyond the 
specific findings of these studies, these results highlight the power of integrating several 
sources of information and of considering networks rather than genes in isolation to uncover 
some of the biological processes connecting genetic variation with observable phenotypes. 



Mendelian Randomization for Predicting Causality 

Network analyses tend to rely on statistical association, such as correlation, to infer the 
network structure from observational data. In order to identify new drug targets and 
interventions that can reduce disease risk, we must identify which factors cause disease and 
which are merely associated with it, as association does not necessarily imply causation 
[55]. For example, if a metabolite is observed to be associated with disease, one cannot say 
that the metabolite is causal of the disease based on this observation alone. Such a claim can 
only be made using methods such as interventional experiments where the metabolite level 
is manipulated and the resulting effect on disease is observed. Yet, in some cases, statistical 
analyses of genomic variation may be able to determine which relationships are causal. 
Genetic variation is unique in that an individual's alleles are assigned randomly during 
meiosis (assuming non-assortative mating with respect to the phenotype of interest) and are 
largely fixed for the life of the individual. Consequently, genetic variants can be viewed as 
natural perturbations of the system (Mendelian Randomization [56,57]), allowing us to 
interpret a genetic-phenotypic association as causal in one direction, namely genetic variant 
causes phenotype. Leveraging this causal information through statistical frameworks such as 
instrumental variables (IVs) allows one to infer whether other associations with disease are 
indeed causal or simply correlations. Such an approach has provided evidence that increases 
in plasma HDL cholesterol do not lower the risk of myocardial infarction despite the well- 
documented strong negative correlation between the two in observational studies, thus drug 
targeting of HDL-C for CAD treatment may not be a successful strategy [58]. Applied on a 
larger scale, statistical methods based on the principles of Mendelian Randomization 
approaches can help orient the edges in an undirected gene or metabolite network (inferred 
from associations), generating directed networks representing putative causal structures 
[52,59,60] that can later be tested experimentally either in laboratory models or randomized 
controlled trials in humans. 
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Integrative Omics to Decipher the Role of Inflammation in CAD 

In humans, large-scale multi-omic datasets are being increasingly utilized to better elucidate 
the biological pathways responsible for the known links between inflammation and CAD 
[61]. Laurila et al. [62] analyzed Finnish individuals assayed for transcriptomics in fat 
tissue, HDL lipidomic profiles, and genotypes, in order to dissect the genetic contribution to 
levels of plasma high-density lipoprotein cholesterol (HDL-C), as HDL-C levels are known 
to be negatively associated with risk of CAD. The analysis compared multi-omic profiles of 
individuals at the extremes of HDL distribution and highlighted the role of the HLA region 
and of inflammation pathways in controlling HDL-C levels and a wide range of differences 
in both adipose transcriptome and lipidomic profiles. Another analysis of CAD case/control 
individuals from the Framingham Heart Study [63] revealed differences in co-expression 
patterns of genes between cases and controls; these differential modules were found to be 
enriched for quantitative trait loci (QTLs) that affect CAD risk, including a module enriched 
for B-cell immune genes. By integrating various data sources, including protein-protein 
interaction (PPI) networks and gene networks derived from other studies, they found several 
regulatory genes, dubbed key drivers, exhibiting strong effects on these modules. One such 
gene was TNFRSF13C, which is known to affect aortic root atherosclerosis, thereby 
providing support for the hypothesis that changes in co-regulation of B-cell-related gene 
networks, caused by such drivers, was partly responsible for increases in CAD risk. 

Expanding beyond the small set of metabolites traditionally associated with CAD (e.g. total 
LDL/HDL cholesterol or triglycerides), characterization of a wide-range of metabolite 
levels has become increasingly important in understanding of etiology CAD and related 
metabolic diseases [64,65]. Atherosclerotic plaques themselves are heterogeneous and 
composed of multiple immune cell types and a wide variety of fats, lipoproteins, and other 
metabolites [66,67]. Thus there is a need to include metabolomic variation together with 
genetic and transcriptomic effects. Such analyses have now become possible with 
technological advances in ! H NMR and high-resolution mass spectrometry, which routinely 
measure levels of hundreds of metabolite species in large human cohorts. For example, a 
recent metabolomic GWAS of 216 serum metabolite measures in over 8,300 Finnish 
individuals [68] identified 31 loci with genome -wide significant associations to metabolite 
levels. Integration of genomics, transcriptomic and metabolite data in a large Finnish cohort 
led to specific evidence for the role of inflammation in CAD through the identification of 
the Lipid-Leukocyte (LL) module [69] , a network of highly co-expressed genes related to the 
acute inflammatory response, which was inferred to be reactive to lipid levels through causal 
inference methods [59]. This analysis was later extended to assess a wide -range of 
metabolite species, revealing the effect of specific sub-species on the module's coherence, 
indicating that the degree of co-expression in the module was significantly associated with 
metabolite levels such as linoleic acid and various LDL and HDL particles [70] . Later, by 
integrating serum metabolomics data with genetic variation and transcriptomic data in 
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humans and in mice, Inouye et al. [46] used a powerful data-driven multivariate approach 
to identify 11 metabolic networks and detect 7 previously -unknown loci associated with 
serum metabolite levels, notably SERPINA1 and AQP9. Transcriptional data was then used 
to show that AQP9 expression in murine liver was associated with the size of atherosclerotic 
lesion and, in humans, both AQP9 and SERPINA1 expression in arterial tissue was 
substantially up-regulated in plaques, providing a potential explanatory link between genetic 
variation and a phenotype relevant to disease progression. 



New Data Sources to Model Disease 

Apart from genetic, transcriptomic, and metabolomic data, another rich source of 
information is the microbiome, the characterization of the diverse set of microbial 
communities inhabiting the human body, which are increasingly being recognized as 
important factors in obesity and atherosclerosis [71-73]. Recent evidence points to the 
presence of oral pathogens such as Chryseomonas, Veillonella, and Streptococcus in 
atherosclerotic plaques and thus potentially contributing to the inflammatory process 
[74,75]. A recent example of one such integrative study involving the gut microbiome, 
genetic variation, and gene expression in inbred mice strains (Hybrid Mouse Diversity 
Panel) [76] across several time points revealed the complex interactions between these 
components in contributing to obesity. Further, the effects of high-fat/high-sucrose diets on 
the composition of gut bacterial communities was modulated by the specific genetic makeup 
of each mouse strain, leading to down-stream effects on metabolism and eventual risk of 
obesity. 

Another potential source of useful information is epigenetics, which describes both 
epigenetic marks and non-coding RNAs (ncRNAs) [77,78]. Many epigenetic marks are 
thought to be reset during early embryogenesis; however, some epigenetic marks may still 
be passed between generations [79], and marks may be modified in response to 
environmental exposures such as smoking [80]. Some epigenetic effects relevant to CAD 
include homocysteine-induced methylation in vascular smooth muscle cells, contributing to 
atherosclerosis [81], and the ncRNA microRNA-33 found to regulate cholesterol 
homeostasis [82] . Further large-scale studies will be required in order to assay genome -wide 
methylation status and ncRNA expression levels in concert with other data sources. 



Future Directions 

Apart from incorporating novel sources of biological variation, one important factor not 
considered in many studies is time itself: the biological processes underlying cardiac and 
metabolic disease are dynamic, all while interacting with a complex array of environmental 
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and genetic effects. CAD may take several decades to manifest clinically, and is typically 
preceded by sub-clinical phenomena. Clearly, this disease progression can be influenced at 
various stages by external interventions such as life-style changes (diet and exercise) and 
medication such as statins, and disease trajectories vary widely between individuals. Hence, 
to better understand these processes larger and more detailed repeated measures data will be 
required. Such experiments present their own unique challenges in analysis and 
interpretation [83,84]. A recent multi-omic study examined whole-genome sequencing, 
RNA sequencing, proteomics, metabolomics, ncRNA, and auto-antibody data measured in 
blood components from one individual over a 14 month period [85], revealing in detail the 
correlation of multiple body systems like inflammatory and insulin response pathways to 
viral infection and early onset of type 2 diabetes. This study provides a glimpse into what 
will become more common, with costs of technology reducing to a level enabling such 
detailed measurements over much larger cohorts and longer time-scales. Similar studies 
with much larger sample sizes based on individuals with various disease states will be 
necessary to be able to draw robust conclusions about the pathogenesis of CAD and 
improve preclinical models. 

To complement temporal dynamics, spatial effects will need to be considered as well, that 
is, heterogeneity in expression across tissues [86,87]. Studies that consider only one tissue 
type, typically blood and its components, will not be able to detect such variation. This will 
require assays of multiple tissue types, both to capture coordinated processes happening in 
all tissues and to localize effects that occur only in specific tissues. One such effort is the 
Genotype-Tissue Expression (GTEx) project [88], which aims to conduct a wide-ranging 
survey of gene expression and its associated genetic variation across multiple human tissues. 

In addition to the important goals of predicting those individuals that are at high risk or 
disease and developing a deeper understanding of disease etiology, another major avenue of 
research where systems and network approaches may have an impact is pharmacogenomics, 
that is, the study of how genetic variation influences each individual's response to 
medication. Both drug response and drug metabolism can be considered as phenotypes, and 
many of the existing analysis methods are applicable. This will be useful for tailoring more 
specific sets of medication to each individual, matched to their genomic profile [89,90]. The 
clinical utility of pharmacogenomics approaches has so far been rather limited [91]. 
However, testing for loss-of-function variation in the gene CYP2C19 (part of the P450 family 
of enzymes) is now recommended for informing the use of the antiplatelet drug clopidogrel 
in individuals with acute coronary syndrome (ACS) undergoing percutaneous coronary 
intervention (PCI) [92], since such variants may adversely affect platelet activity and 
increase the risk of cardiovascular events. 
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Conclusions 

CAD and related diseases are complex phenotypes and are the result of interplay amongst a 
multitude of genetic and environmental effects together with an important inflammatory 
aspect. Initially, analyses tended to be restricted to only one type of high throughput data, 
such as GWAS considering association of disease with genetic markers, or only considering 
blood biomarkers for disease progression. While such focused analyses have proven 
insightful, they provide limited information about a narrow aspect of the disease, while 
ignoring the complex interactions between these different processes. The advent of multi- 
omic studies which concurrently analyze genetic variation, transcriptomics, metabolomics 
and others sources of information over hundreds or thousands of individuals, has begun 
providing deeper insight into the underlying mechanisms responsible for observed 
phenomena, such as the gene networks responsible for the known link between 
inflammation and disease and the various feedback loops amongst genetic variation, 
microbial communities, metabolism, and ultimately disease. 

Hence, the coming major challenges will firstly be collection of large-scale multi-omic 
samples across thousands of individuals in order to achieve the statistical power to detect 
the multitude of interacting effects in CAD etiology. Secondly, novel statistical and 
computational methods will be required to effectively combine these disparate information 
sources into a coherent model of disease, with the aim of generating plausible biological 
hypotheses that can be tested in animal models and clinical trials. Thirdly, the testing of these 
models will need to rigorously evaluate the specific reliability of the models as well as their 
modes of failure so that refinements to the model can be made. Meeting these challenges will 
result in greater understanding of CAD etiology and, if successfully implemented, translate into 
better clinical outcomes. 



11 



Downloaded from http://biorxiv.org/on September 18, 2014 



Figures 

Figure 1: Network-based analysis of omic data to model the processes connecting genetic 
variation to disease. 
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Table 1 : Molecular networks relevant to coronary artery disease 



Name 


Source 


Main associations 


Biological interpretation 


Ref 


Macrophage- 


BxH mouse cross, 


Obesity, diabetes, 


Network represents an 


\5U 
IV L i 


enriched 


TY11 1 1 tl tVI P flQQIIf**! 


a f Vi pttkH pro <;i ^ tora 1 


iTTflummatnTv TPsnniTsp Htivpti Viv 




metabolic 




cholesterol levels, and HDL 


genetic variation and affecting 




network 




levels 


disease phenotypes, in liver and/or 




fMFMNI 






adipose tissue. 






rluman 


VailOUb IIlcLaPUllLCi) ^JnJ_yJ_., 


1V10U.U.1C iiiipiicaLca d-LULc 


701 


lanlrn/ivffl II 1 1 

leuKocyte 


population-based 


VLDL, glycoproteins, 


inflammatory cells (mast cells and 




module 


cohort, whole 


isoleucine, and others) and 


basophils) as reactive to various 






blood 


niiiiiU-iic icbpuiioc iiiaiKcio 


IIlcLaPOllLCa. lllVCIbC ICla-LlOIlbllip 








f"TT -Ira and ripaw 


TiPttx^ppn tnnHnlp pynrpcemn atiH t1"C 








mnlppnlpiT wpicrhf admnnppfm^ 

lllUlvdilCll VV ^_ 1^4111 tl LI 1 L' \__» 1 1\_ \_ L 1 1 1 / 


HpTTsirv 




Combined 


Human cohort 


VCAM1 levels and SNPs 


Pathway represents an 


[62] 


Inflammatory 


enriched for low 


predictive of low HDL-C 


inflammatory link between genetic 




Pathway 


familial HDL-C, 




variation and HDL-C levels and 






adipose tissue 




indicates that HDL levels may be 










regulated by inflammatory 










processes. 




Two 


Human CAD 


Control module enriched for 


Differential pathways represent two 


[63] 


differential 


case/ control study, 


SNPs predictive of CAD. 


modes of regulation acting in CAD 




modules (case 


whole blood 




vs. non-CAD individuals, and 




and control) 






indicate that B-cell immune 










pathways may be causally driving 










lipids levels and risk of CAD. 





HDL: high density lipoprotein 

VLDL: very low density lipoprotein 

IgE: immunoglobulin E 

IL-lra: Interleukin 1 receptor antagonist 

CRP: C-reactive protein 

HMWA: high molecular weight adiponectin 

VCAM1: vascular cell adhesion molecule 1 
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